Large language models, behaviour and cognition: Making sense of the new black boxes with old tricks
Bennett Kleinberg
Tilburg University & University College London
IMT Lucca, 26 September 2024
A new black box?
Remarkable potential (Kaddour et al. 2023) and danger (Mozes et al. 2023):
Before large language models:
The GPT-3 approach:
What we know (Brown et al. 2020)
| Dataset | Tokens (billion) | Weight |
|---|---|---|
| Common Crawl | 410 | 60% |
| Books 1 and 2 | 67 | 16% |
| WebText2 | 19 | 22% |
| Wikipedia | 3 | 3% |
LLMs are autoregressive language models
Consider the following examples:
My name is ________
My name is Bond. ________
My name is Bond. James ________
\(P(x_n | x_1, x_2, ..., x_{n-1})\)
Core idea: prompts = conditioning of the autoregressive probability function
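The conditioning idea can be sketched with a toy bigram model. This is not how GPT-3 works internally (it conditions on the full context with a transformer), but it uses the same autoregressive factorisation: the prompt fixes the context, and the model returns a distribution over the next token.

```python
# Toy autoregressive model: estimate P(x_n | x_{n-1}) from bigram
# counts in a tiny corpus. Prompting = conditioning: the context
# shifts the next-token distribution. (Illustrative only.)
from collections import Counter, defaultdict

corpus = "my name is bond . james bond .".split()

bigrams = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    bigrams[prev][nxt] += 1

def next_token_probs(context):
    """P(next token | last token of context), from bigram counts."""
    counts = bigrams[context[-1]]
    total = sum(counts.values())
    return {tok: c / total for tok, c in counts.items()}

# Different prompts condition the model on different distributions:
print(next_token_probs(["my", "name", "is"]))  # → {'bond': 1.0}
print(next_token_probs(["bond", "."]))         # → {'james': 1.0}
```

A real LLM does the same thing with a vocabulary of tens of thousands of tokens and a context window of thousands of tokens, which is exactly why its behaviour is harder to inspect.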
So what makes this difficult to investigate?
A statistical model (here: linear regression):
\(y_i = \alpha + \beta_1x_{i1} + \epsilon_i\)
This model has two parameters: \(\alpha\) and \(\beta_1\)
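For contrast with what follows: the two parameters of this regression can be recovered directly from data by ordinary least squares. A minimal sketch with simulated data (true \(\alpha = 1\), \(\beta_1 = 2\); numbers are illustrative):

```python
# Fit y_i = alpha + beta_1 * x_i1 + eps_i by ordinary least squares.
# With two parameters, estimation is a closed-form matrix solve.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 1.0 + 2.0 * x + rng.normal(scale=0.1, size=200)  # true alpha=1, beta_1=2

X = np.column_stack([np.ones_like(x), x])  # design matrix [1, x]
alpha_hat, beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
print(alpha_hat, beta_hat)  # close to (1.0, 2.0)
```

Nothing comparable exists for a model with 175 billion parameters: we cannot read the model off its estimated coefficients.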
But LLMs have many parameters.
GPT-3:
Problem 1:
New AI models are quickly adopted (too quickly?) with
Problem 2:
Analytical approaches are inevitably infeasible:
We already know how to study black boxes.
Old tricks for a new black box
One is rooted in behaviourism.
The other is rooted in modern cognitive science.
Let’s take a look back a few decades
Central argument:
Behaviour is what organisms do.
Psychology as a “pure branch of the natural sciences” (Watson 1913)
\(\rightarrow\) Focus on the observable (Skinner 1935)
Learning: change in behaviour based on experience
Also called: respondent conditioning
Builds on classical conditioning.
But: focuses on how consequences affect voluntary behaviour
Also called instrumental conditioning
Known as the Skinner box:
Behaviourism studies psychological events in terms of behavioural criteria.
\(\rightarrow\) mental states are deemed irrelevant
… this was a successful and dominant paradigm in psychology
So what happened?
From Edward C. Tolman (1948)
What does the behaviourist predict?
But the delayed reward group:
There must have been some learning in days 1-10!
This is called latent learning.
This can only be accommodated in the cognitive paradigm.
Are we going through the same development from 100 years ago?
[…] studying machine behaviour does not imply that AI algorithms necessarily have independent agency nor does it imply algorithms should bear moral responsibility for their actions. If a dog bites someone, the dog’s owner is held responsible. (Rahwan et al. 2019, 483)
[…] machines exhibit behaviours that are fundamentally different from animals and humans, so we must avoid excessive anthropomorphism and zoomorphism. Even if borrowing existing behavioural scientific methods can prove useful for the study of machines, machines may exhibit forms of intelligence and behaviour that are qualitatively different—even alien—from those seen in biological agents. (Rahwan et al. 2019, 483)
LLMs are, like the human brain, black boxes to some extent. (Hagendorff et al. 2023, 9)
Concepts such as reasoning, intuition, creativity, intelligence, personality, mental illness, etc. are transferred into LLMs. (Hagendorff et al. 2023, 9)
This allows us to embrace three perspectives on the artificial mind
Testing the artificial mind
From Jakesch, Hancock, and Naaman (2023)
Two potential confounds:
Our context: relationship advice task
So you have managed to stay together for twenty years and now your relationship is threatened by the opportunity to move to another continent. My immediate opinion is that you should be able to talk this out, agree on a reasonable compromise which will allow you to stay together. A little give and take is needed, probably. This opportunity seems too good to turn down so it is really a question of working out a way of doing it which is agreeable to both. I suggest that the one who is against the move sits down with the other partner and calmly lay out their objections. These are serious objections and have to be considered. But there must be ways of overcoming them. Consider how long the move is to last for. Is there room for compromise there? What about holidays? What about regular visits back to the homeland? […] It’s not easy and you have my best wishes.
Hey there, I can only imagine how tough things must be for you right now, considering the crossroads you’re at with your partner. Twenty years is a long time to share your life with someone, and facing the possibility of a breakup over a move to another continent is heart-wrenching. First off, it’s crucial to approach this upcoming confrontation with a clear head. I know emotions are running high, but try to see this as an opportunity for an open, honest conversation rather than a battleground. Remember, it’s not you versus them; it’s both of you versus the problem. Here’s what I suggest: find a quiet, comfortable space where you can talk without interruptions. Start by expressing how much you value the years you’ve spent together and the love you share. It’s important to acknowledge the weight of your history together before diving into the current issue. Then, lay out your feelings and concerns about the move openly. […]
Experimental approach:
All texts (\(n_{Exp.1}=1060\) and \(n_{Exp.2}=1064\)) were evaluated by independent human assessors (from 1=“definitely AI-generated” to 5=“definitely human-written”).
How did the LLM do this?
The LLM’s language - when instructed to be human - contained:
The LLM relies on an implicit representation of empathy: stochastic empathy.
Measuring the artificial mind
“[Many of the constructs of interest] would be considered latent variables in psychological theory: these constructs are not directly observable nor directly measurable. Instead, these variables are indirectly measured through measurable behaviours hypothesised to be caused by the underlying latent trait.” (Peereboom, Schwabe, and Kleinberg, n.d.)
The argument:
Beyond the sum scores of measurement tools, we need to examine the latent structure: if we assume human instruments can be used on LLMs, the data should show a human-like latent pattern.
Analytical approach:
Let us look at latent structures:
What if we look at a looser exploratory factor analysis?
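As a sketch of what “looking at latent structure” involves (all data below are simulated, not our study’s): a common first step before exploratory factor analysis is inspecting the eigenvalues of the item correlation matrix. If items tap a shared latent construct, one dominant factor emerges; if responses are unstructured, no factor dominates.

```python
# Simulated questionnaire data: 500 "respondents", 6 items.
# Structured responses share one latent trait; unstructured do not.
import numpy as np

rng = np.random.default_rng(1)
latent = rng.normal(size=(500, 1))
structured = latent @ np.ones((1, 6)) + rng.normal(scale=0.5, size=(500, 6))
unstructured = rng.normal(size=(500, 6))

def top_eigen_share(X):
    """Share of variance carried by the first eigenvalue of corr(X)."""
    corr = np.corrcoef(X, rowvar=False)
    eig = np.sort(np.linalg.eigvalsh(corr))[::-1]
    return eig[0] / eig.sum()

print(top_eigen_share(structured))    # large: one dominant factor
print(top_eigen_share(unstructured))  # near 1/6: no shared factor
```

The question for LLM responses is which of these two patterns they resemble when answering instruments built for humans.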
Our findings suggest that questionnaires designed for humans do not measure similar latent constructs in LLMs, and that these latent constructs may not even exist in LLMs in the first place. […] A thorough psychometric evaluation is essential for studying LLM behaviour. It may help us decide which effects are worth pursuing, and which effects are cognitive phantoms. (Peereboom, Schwabe, and Kleinberg, n.d.)
Mapping the artificial mind
Recent work on “monosemanticity” (Templeton et al. 2024; Bricken et al. 2023)
Mechanistic interpretability seeks to understand neural networks by breaking them into components that are more easily understood than the whole.
For the human brain: neurons.
For an LLM: monosemantic features.
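Why feature directions rather than neurons? A toy illustration of superposition (all numbers and the dictionary are made up; a real sparse autoencoder learns the feature directions from activations):

```python
# Suppose a layer with 3 neurons represents 5 sparse "features":
# each feature is a direction in activation space, and each neuron
# mixes many features (polysemanticity). The monosemantic units are
# the feature directions, not the neurons.
import numpy as np

rng = np.random.default_rng(2)
n_neurons, n_features = 3, 5

# Hand-made unit-norm feature directions (the "dictionary" a sparse
# autoencoder would learn from real activations).
D = rng.normal(size=(n_features, n_neurons))
D /= np.linalg.norm(D, axis=1, keepdims=True)

# An activation is a sparse combination: only features 0 and 3 are on.
code = np.zeros(n_features)
code[0], code[3] = 1.5, 0.7
activation = code @ D

# No single neuron cleanly reports "feature 0" or "feature 3" -
# every neuron mixes contributions from both active features.
for j in range(n_neurons):
    contrib = code * D[:, j]  # per-feature contribution to neuron j
    print(f"neuron {j}:", np.round(contrib, 2))
```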
https://transformer-circuits.pub/2023/monosemantic-features/vis/a1.html#feature-1717
https://transformer-circuits.pub/2023/monosemantic-features/vis/a1.html#feature-2663
Excellent explainer: https://www.astralcodexten.com/p/god-help-us-lets-try-to-understand
So what?
This is where psychologists, cognitive scientists and neuroscientists can shine!
Stochastic parrots! (Bender et al. 2021)
one important consequence of imprudent use of terminology in our academic discourse is that it feeds AI hype (Bender and Koller 2020, 5186)
Bigger and bigger models:
Demonstrated first by (Alzantot et al. 2018) and many others since:
See also Mozes, Kleinberg, and Griffin (2022), Mozes, Bartolo, et al. (2021), Mozes, Stenetorp, et al. (2021)
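The flavour of such word-substitution attacks can be sketched as follows. The classifier and synonym table below are invented for illustration; the actual attack of Alzantot et al. (2018) uses a genetic search over embedding-space neighbours with semantic constraints.

```python
# Hedged sketch of a word-substitution adversarial attack: swap words
# for synonyms until a (toy) classifier flips its prediction.

# A toy sentiment "model": counts known positive words.
POSITIVE = {"great", "excellent", "terrific"}

def toy_classifier(tokens):
    return "positive" if any(t in POSITIVE for t in tokens) else "negative"

# Hypothetical synonym table (the attack's substitution candidates).
SYNONYMS = {"great": ["fine", "solid"], "movie": ["film"]}

def attack(tokens):
    """Greedily swap one word at a time until the label flips."""
    original = toy_classifier(tokens)
    tokens = list(tokens)
    for i, tok in enumerate(tokens):
        for syn in SYNONYMS.get(tok, []):
            candidate = tokens[:i] + [syn] + tokens[i + 1:]
            if toy_classifier(candidate) != original:
                return candidate
    return None  # no successful substitution found

print(attack("a great movie".split()))  # → ['a', 'fine', 'movie']
```

A meaning-preserving edit flips the prediction, which is the point: the model tracks surface word statistics, not meaning.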
“GPT-3 is a model of how words relate to one another, not a model of how language might relate to the perceived world.” (Gary Marcus)
How do we best study the mind processes of an AI model?
Thank you
If you’re interested in this work, please get in touch.
Tomorrow: practical session on LLMs in R (9:00h)